Enable minion's IPC channel to aggregate results from spawned jobber processes #61468

devkits · 2022-01-17T02:52:09Z

What does this PR do?

Enable minion's IPC channel to aggregate results from spawned jobber processes. Use a long-running request channel in the minion parent process to communicate job results back to the master via broker-based or broker-less transport.

This is a necessary optimization at scale for transports that prefer a sustained long-running connection because connection create/dispose operations are expensive. The working assumption is that this change benefits all supported transports.

Testing Done:

These tests provide coverage for this use case:
.../salt/tests/pytests/integration/minion.*

Previous Behavior

The minion spawned a process for each job. That process ran the job, then created and disposed of a new transport connection just to communicate that one job result. This does not scale well for transports that prefer persistent connections because connection operations are expensive.

New Behavior

The minion spawns a process that runs a job and communicates the result to the parent minion process via an IPC channel. The parent minion process uses a long-running connection to communicate the result back to the master. This approach scales better because it reduces connection churn when thousands of minions are responding to the master.
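To make the difference in connection churn concrete, here is a minimal sketch rather than Salt's actual code. It assumes threads as stand-ins for jobber processes and a `queue.Queue` as a stand-in for the minion's IPC channel; `Connection`, `run_job`, `previous_behavior`, and `new_behavior` are hypothetical names invented for this illustration.

```python
import queue
import threading


class Connection:
    """Hypothetical stand-in for a transport connection to the master."""

    def __init__(self):
        self.sent = []

    def send(self, result):
        self.sent.append(result)


def run_job(jid):
    """Pretend to execute a job and build its return payload."""
    return {"jid": jid, "return": True}


def _run_all(target, jids):
    """Start one worker per job (threads model jobber processes here)."""
    workers = [threading.Thread(target=target, args=(jid,)) for jid in jids]
    for worker in workers:
        worker.start()
    return workers


def previous_behavior(jids):
    """Each jobber creates its own connection to send a single result."""
    connections = []

    def jobber(jid):
        conn = Connection()       # created for one result, then discarded
        conn.send(run_job(jid))
        connections.append(conn)  # list.append is thread-safe in CPython

    for worker in _run_all(jobber, list(jids)):
        worker.join()
    return connections


def new_behavior(jids):
    """Jobbers hand results to the parent, which reuses one connection."""
    jids = list(jids)
    channel = queue.Queue()  # models the minion's IPC channel
    conn = Connection()      # long-running connection, opened once

    def jobber(jid):
        channel.put(run_job(jid))

    workers = _run_all(jobber, jids)
    for _ in jids:
        conn.send(channel.get())  # parent forwards each aggregated result
    for worker in workers:
        worker.join()
    return [conn]
```

In this model, `previous_behavior` creates one connection per job, while `new_behavior` creates a single connection no matter how many jobs complete.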

Fixes: #61274


devkits · 2022-01-18T00:20:12Z

re-run full all

salt/minion.py

jfindlay · 2022-01-19T19:33:45Z

You might consider also updating https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow.

devkits · 2022-01-20T20:47:29Z

> You might consider also updating https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow.

Thanks. I will update https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow shortly.

devkits · 2022-01-21T14:53:43Z

re-run full all

…r events intended for the master's ReqServer. Tag these specific events with a specific tag "__master_req_channel_payload".

Update the architecture flow accordingly: https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow

Testing Done:
- updated unit tests
- tested manually with zeromq
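The tagging scheme in this commit message can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the code from the PR: only the tag string comes from the commit message, while `handle_event`, its parameters, and the exact-match comparison are hypothetical.

```python
# Tag named in the commit message; everything else here is hypothetical.
MASTER_REQ_TAG = "__master_req_channel_payload"


def handle_event(tag, data, req_channel_send, local_handlers):
    """Forward tagged payloads to the master; dispatch other events locally.

    Exact-match comparison on the tag is an assumption of this sketch.
    """
    if tag == MASTER_REQ_TAG:
        req_channel_send(data)  # payload destined for the master's ReqServer
        return "forwarded"
    handler = local_handlers.get(tag)
    if handler is not None:
        handler(data)
        return "handled"
    return "ignored"
```

A dedicated tag lets the parent process tell payloads meant for the master apart from ordinary events on the same bus without inspecting the payload itself.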

dwoz · 2022-02-04T00:05:40Z

@devkits This test needs to be fixed. Otherwise, the PR looks good.

devkits · 2022-02-08T14:49:52Z

Thanks. I can reproduce the test failure locally. It looks like the fix may need to be in the TCP transport itself. I will try to update the PR with the fix.

garethgreenaway · 2022-02-09T18:01:55Z

@devkits Can you please add a changelog for this PR? Thanks!

welcome · 2022-02-09T21:27:48Z

Congratulations on your first PR being merged! 🎉

Since [1] minions now return all returns to all masters, instead of just the master that spawned the job. The upstream change in behaviour overloads our global masters making them unusable, so this aims to revert to the previous behaviour whilst maintaining the single-channel return improvements also introduced in [1].

[1]: saltstack#61468

Upstream-bug: saltstack#62834
Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
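The behaviour this commit message asks for, returning a result only to the master that spawned the job, might be sketched as follows. This is not the actual fix: `route_return`, the `channels` mapping, and the `"master"` field on the result are all hypothetical names chosen for this example.

```python
def route_return(result, channels):
    """Send a job return only to the master that published the job.

    `channels` maps a master identifier to a send callable. The "master"
    key on `result` is a hypothetical field used only in this sketch.
    """
    send = channels.get(result["master"])
    if send is None:
        raise KeyError(f"no channel for master {result['master']!r}")
    send(result)
```

With one channel per master, a return reaches a single master instead of being broadcast to every master the minion is connected to.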


…sc#1213257)

* Revert usage of long running REQ channel (bsc#1213960, bsc#1213630, bsc#1213257)
  This reverts commits: saltstack/salt@a99ffb5, saltstack/salt@80ae518, saltstack/salt@3c7e1ec, saltstack/salt@171926c
  From this PR: saltstack/salt#61468
  See: saltstack/salt#62959 (comment)
* Revert "Fix linter"
  This reverts commit d09d2d3.
* Revert "add a regression test"
  This reverts commit b2c32be.
* Fix failing tests after reverting commits

devkits requested a review from a team as a code owner January 17, 2022 02:52

devkits requested review from garethgreenaway and removed request for a team January 17, 2022 02:52

devkits and others added 2 commits January 17, 2022 10:22

Fix minion unit tests, specifically .../tests/pytests/test_minion.py

5aa44c9

Merge branch 'master' into minion_job_completion_optimization

f55d78b

jfindlay reviewed Jan 19, 2022

salt/minion.py

devkits and others added 2 commits January 26, 2022 17:28

Merge branch 'master' into minion_job_completion_optimization

fb012e7

garethgreenaway requested review from dwoz and s0undt3ch January 27, 2022 00:56

Ch3LL added the Phosphorus v3005.0 Release code name and version label Jan 27, 2022

Merge branch 'master' into minion_job_completion_optimization

86b6abd

dwoz mentioned this pull request Feb 2, 2022

[BUG] Memory Leak in EventPublisher process #61565

Open

frebib mentioned this pull request Feb 3, 2022

[BUG] v3004 leaks fds/pipes causing Too many open files crash #61521

Closed

6 tasks

Merge branch 'master' into minion_job_completion_optimization

283e8c3

dwoz added 3 commits February 8, 2022 18:05

Move retries to channel

5ba3885

Clean up cruft

5ada2e5

Pre commit fixes

987affd

dwoz previously approved these changes Feb 8, 2022


twangboy previously approved these changes Feb 8, 2022


dmurphy18 mentioned this pull request Feb 9, 2022

[FEATURE REQUEST] Salt Minion should cache ReqChannels. #61274

Closed

s0undt3ch previously approved these changes Feb 9, 2022


Add changelog

7215701

dwoz dismissed stale reviews from s0undt3ch, twangboy, and themself via 7215701 February 9, 2022 21:25

garethgreenaway merged commit df7617e into saltstack:master Feb 9, 2022

dwoz mentioned this pull request Feb 10, 2022

[TEST FAILURE] tests.pytests.failover.multimaster (zeromq) #61622

Closed

frebib mentioned this pull request Oct 7, 2022

[BUG] Minion sends return events to all masters #62834

Closed

frebib mentioned this pull request Aug 30, 2023

[BUG] Runners occasionally fail with "RuntimeError: dictionary changed size during iteration" #65082

Closed

9 tasks

meaksh mentioned this pull request Aug 31, 2023

Revert usage of long running REQ channel (bsc#1213960, bsc#1213630, bsc#1213257) openSUSE/salt#600

Merged

3 tasks
